Use `curl_headers` and `curl_output` for `Livecheck` strategies. #15351

reitermarkus · 2023-05-02T22:01:58Z

Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same change?
Have you added an explanation of what your changes do and why you'd like us to include them?
Have you written new tests for your changes? Here's an example.
Have you successfully run brew style with your changes locally?
Have you successfully run brew typecheck with your changes locally?
Have you successfully run brew tests with your changes locally?

#15099 seems to be stalled, so I'm adding the curl_head usage here. Also switch to curl_output instead of directly calling curl_with_workarounds.

samford

As mentioned in #15117 (review), #curl_head is catered to the needs of CurlDownloadStrategy and isn't currently suitable for livecheck (namely, HeaderMatch).

For example, #curl_head will retry a request if a Content-Disposition header isn't present. This may be appropriate behavior for CurlDownloadStrategy but it doesn't make sense for livecheck, where we overwhelmingly check Location headers (and only a handful of those checks also provide a Content-Disposition header). Any retry behavior needs to be tied to the actual header(s) we're checking and #curl_head doesn't have a way of understanding that or controlling that behavior. As it stands, the hundreds of checks that check a Location header but don't contain responses with a Content-Disposition header would end up needlessly retrying with a GET request.

Besides that, the retry behavior is a mandatory part of #curl_head and there's currently no way to opt out or specify which request method to use (e.g., GET). There will be scenarios in the future where we specify the request method in a livecheck block (after the options feature is available) and that should effectively disable any automatic retry behavior. This would ensure that we make the correct request the first time, reducing the burden on upstream servers.

With that in mind, I'm of the opinion that the retry logic that's currently in #curl_head should be moved into CurlDownloadStrategy#resolve_url_basename_time_file_size, where it would check for a failed request or the lack of a Content-Disposition header and call #curl_head again with a GET request. This would make #curl_head a general-purpose method that could be used in livecheck. In this scenario, we would also add contextually-appropriate error-handling in livecheck, where #curl_head (or #page_headers) would be run again.

I'm not explicitly against using a general-purpose #curl_head method in Strategy#page_headers but, as mentioned, #curl_head would need some rework and we would still need some contextually-appropriate retry logic in livecheck (probably in HeaderMatch). However, I am against using #curl_head in livecheck as-is.

reitermarkus · 2023-05-03T07:19:26Z

the hundreds of checks that check a Location header but don't contain responses with a Content-Disposition header would end up needlessly retrying with a GET request.

Currently they are all using GET, potentially needlessly if HEAD would suffice.

Also, I would imagine most Locations point to a file which should have a Content-Disposition header anyways.

MikeMcQuaid · 2023-05-03T11:02:00Z

There will be scenarios in the future where we specify the request method in a livecheck block (after the options feature is available) and that should effectively disable any automatic retry behavior.

This should not block on potential future scenarios, that's not very fair on @reitermarkus.

With that in mind, I'm of the opinion that the retry logic that's currently in #curl_head should be moved into CurlDownloadStrategy#resolve_url_basename_time_file_size, where it would check for a failed request or the lack of a Content-Disposition header and call #curl_head again with a GET request. This would make #curl_head a general-purpose method that could be used in livecheck.

@samford am I right in saying this is the sole blocker, from your perspective, on this PR being merged? If so, does this seem reasonable to you @reitermarkus?

samford · 2023-05-03T13:33:53Z

Currently they are all using GET, potentially needlessly if HEAD would suffice.

I agree that we should use HEAD for #page_header requests when possible but what I'm trying to explain is that #curl_head isn't capable of only using HEAD when it's sufficient for livecheck. Due to the Content-Disposition retry logic in #curl_head, this PR would make it so that 4 out of 5 HeaderMatch checks would issue HEAD requests before doing the same GET requests that we're currently doing. HEAD would work fine in almost all of the related cases (i.e., there aren't many that need a GET request) but #curl_head will unnecessarily retry with GET because there isn't a Content-Disposition header in any responses.

Also, I would imagine most Locations point to a file which should have a Content-Disposition header anyways.

As mentioned in #15117 (review), we have ~231 checks where only a Location header is present, ~27 where both a Location and Content-Disposition are present, and ~21 that only have Content-Disposition. In practical terms, ~83% of HeaderMatch checks would issue requests with HEAD, discard them, and retry with GET due to the Content-Disposition behavior.

This is worse than the status quo because it doubles the requests and we see no benefit from it. Keep in mind that ~88% of existing HeaderMatch checks contain a Location header and we follow the redirections, so we would go from 2+ requests to 4+ requests.

This should not block on potential future scenarios, that's not very fair on @reitermarkus.

Specifying the request method in a livecheck block isn't directly related to this PR, so I shouldn't have mentioned it. I only brought it up because the changes to #curl_head that would make this PR feasible are essentially the same as what we would need to support that use case.

Am I right in saying this is the sole blocker, from your perspective, on this PR being merged?

Basically yes but Markus and I may have differing views on what #curl_head should and shouldn't do.

I agree with the idea of using HEAD requests for Strategy#page_headers/HeaderMatch and I'm fine with using a general-purpose #curl_head method in #page_headers. However, I don't agree with using #curl_head if it includes retry logic. The conditions where it's appropriate to retry the request with GET are specific to the context where #curl_head is called but #curl_head has no knowledge of that context. Currently, it uses the conditions that are appropriate for CurlDownloadStrategy, which aren't appropriate for livecheck.

From my perspective, the #curl_head changes for this to be feasible are:

Remove the retry logic from #curl_head and handle it in CurlDownloadStrategy where #curl_head is called.
Add an optional parameter to #curl_head to make it use GET instead of HEAD. This is necessary to be able to move the retry logic outside of #curl_head.
Add appropriate retry logic to HeaderMatch, since it has the context we need to understand when to retry a request with GET (#page_headers doesn't have enough context). This is basically HeaderMatch: Retry with a GET request #15099 (maybe with some small tweaks).

MikeMcQuaid · 2023-05-03T13:37:11Z

Am I right in saying this is the sole blocker, from your perspective, on this PR being merged?

Basically yes but Markus and I may have differing views on what #curl_head should and shouldn't do.

Ok, I'll take that as "yes" and @reitermarkus, assuming you're OK with that, please progress accordingly and we can ✅ and merge.

Add an optional parameter to #curl_head to make it use GET instead of HEAD. This is necessary to be able to move the retry logic outside of #curl_head.

This makes very little sense to me. The entire point of that method is to make a HEAD request, no?

samford · 2023-05-03T13:51:24Z

This makes very little sense to me. The entire point of that method is to make a HEAD request, no?

The goal of the method is to fetch/return only the headers of the response(s). head in this case refers to the response head (i.e., the headers), as opposed to the response body (the content). It's easy to conflate it with the HEAD request method, which it uses by default.

The idea of an optional parameter to use GET is in line with curl's --head option, insomuch as it will use HEAD unless --request GET is also specified.

MikeMcQuaid · 2023-05-03T14:03:55Z

The goal of the method is to fetch/return only the headers of the response(s). head in this case refers to the response head (i.e., the headers)

In that case: curl_headers feels more intuitive.

samford · 2023-05-03T14:11:17Z

The goal of the method is to fetch/return only the headers of the response(s). head in this case refers to the response head (i.e., the headers)

In that case: curl_headers feels more intuitive.

I agree. Most people looking at the method are likely to think the same thing that you did and curl_headers would avoid that.

samford · 2023-05-03T14:22:12Z

Also, just to be clear, we don't need to handle the livecheck retry logic in this PR. All that's needed here is to rework #curl_head as described above.

If we move forward with the #curl_head changes, I can rebase #15099 afterward, make a few tweaks, and we can move forward with that. The reason why I didn't rework it to use #curl_head when asked previously is because #curl_head isn't appropriate yet. I meant to explain it in that PR but was distracted with other things and forgot.

MikeMcQuaid · 2023-05-08T11:42:02Z

Looks good to me, thanks @reitermarkus!

Comments addressed.

bevanjkay · 2023-05-09T01:17:47Z

@reitermarkus I'm seeing an error on a number of livechecks undefined method success?' for nil:NilClass
It looks like the status is returning empty from the update to #curl_headers

Error: midiview: undefined method `success?' for nil:NilClass

samford · 2023-05-09T02:47:25Z

I was coming to submit my review and explain the technical issues here (I just finished collecting data, testing, etc.) but apparently didn't make it in time. As-is, this breaks all the HeaderMatch checks and that would have been clear if even one HeaderMatch check was tested.

I'll create a follow-up PR with fixes and explanations (#15389).

MikeMcQuaid · 2023-05-09T09:11:54Z

As-is, this breaks all the HeaderMatch checks and that would have been clear if even one HeaderMatch check was tested.

@samford That this doesn't happen automatically with any of our CI suites is the problem here, not that @reitermarkus missed this or that you didn't review it in time.

You need to figure out a way of automating things that doesn't rely on you catching things before they are merged. It's a reasonable assumption for Homebrew maintainers to be able to merge a PR if the code looks good and the CI looks good.

reitermarkus force-pushed the livecheck-head branch from 231edfb to 2a4cc62 Compare May 2, 2023 22:09

samford previously requested changes May 2, 2023

View reviewed changes

Use curl_head and curl_output for Livecheck strategies.

545a332

reitermarkus force-pushed the livecheck-head branch from 2a4cc62 to 545a332 Compare May 6, 2023 01:40

Rename curl_head to curl_headers.

353818f

reitermarkus enabled auto-merge May 6, 2023 01:42

reitermarkus requested a review from samford May 6, 2023 01:42

reitermarkus changed the title ~~Use curl_head and curl_output for Livecheck strategies.~~ Use curl_headers and curl_output for Livecheck strategies. May 6, 2023

MikeMcQuaid approved these changes May 8, 2023

View reviewed changes

reitermarkus merged commit b05de92 into Homebrew:master May 8, 2023

reitermarkus deleted the livecheck-head branch May 8, 2023 22:55

bevanjkay mentioned this pull request May 9, 2023

microsoft-teams 1.6.00.11156 Homebrew/homebrew-cask#146779

Merged

samford mentioned this pull request May 9, 2023

Strategy#page_headers: Update for #curl_headers #15389

Merged

7 tasks

This was referenced May 10, 2023

GitHub::API: Allow passing options to curl #15364

Closed

Strategy: Pass --max-redirs to #curl_headers #15437

Merged

github-actions bot added the outdated PR was locked due to age label Jun 9, 2023

github-actions bot locked as resolved and limited conversation to collaborators Jun 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `curl_headers` and `curl_output` for `Livecheck` strategies. #15351

Use `curl_headers` and `curl_output` for `Livecheck` strategies. #15351

reitermarkus commented May 2, 2023 •

edited

Loading

samford left a comment •

edited

Loading

reitermarkus commented May 3, 2023

MikeMcQuaid commented May 3, 2023

samford commented May 3, 2023

MikeMcQuaid commented May 3, 2023

samford commented May 3, 2023 •

edited

Loading

MikeMcQuaid commented May 3, 2023

samford commented May 3, 2023

samford commented May 3, 2023

MikeMcQuaid commented May 8, 2023

bevanjkay commented May 9, 2023 •

edited

Loading

samford commented May 9, 2023 •

edited

Loading

MikeMcQuaid commented May 9, 2023

Use curl_headers and curl_output for Livecheck strategies. #15351

Use curl_headers and curl_output for Livecheck strategies. #15351

Conversation

reitermarkus commented May 2, 2023 • edited Loading

samford left a comment • edited Loading

Choose a reason for hiding this comment

reitermarkus commented May 3, 2023

MikeMcQuaid commented May 3, 2023

samford commented May 3, 2023

MikeMcQuaid commented May 3, 2023

samford commented May 3, 2023 • edited Loading

MikeMcQuaid commented May 3, 2023

samford commented May 3, 2023

samford commented May 3, 2023

MikeMcQuaid commented May 8, 2023

bevanjkay commented May 9, 2023 • edited Loading

samford commented May 9, 2023 • edited Loading

MikeMcQuaid commented May 9, 2023

Use `curl_headers` and `curl_output` for `Livecheck` strategies. #15351

Use `curl_headers` and `curl_output` for `Livecheck` strategies. #15351

reitermarkus commented May 2, 2023 •

edited

Loading

samford left a comment •

edited

Loading

samford commented May 3, 2023 •

edited

Loading

bevanjkay commented May 9, 2023 •

edited

Loading

samford commented May 9, 2023 •

edited

Loading